INTRODUCTION

The goal of our project was to better understand the Miami 311 data set categories Animal Bites to a Person and Pitbull Investigations through visualization techniques using R and Geographical Information System (GIS) software.

Our first thoughts were to analyze the variables Goal Days versus Actual Days Completed. However, the complexity of the data set made this a challenging task. After careful study of the levels in issue.type, the variable was narrowed to include only Animal Bite To A Person and Pit Bull Investigation. Initially, we thought there might be a relationship between the number of animal bites and pit bull investigations. For example, an area with a high level of animal bites would have a cluster of pit bull investigations. However, in the first stages of exploratory analysis the idea proved unfruitful, and we opted to explore the data through visualization.

IMPORTANCE OF TOPIC

Pitbulls are viewed as an aggressive breed that poses a danger to humans and other animals. Miami Dade County has legislation in place that puts specific restrictions on pitbulls. If the two categories of issue.type were correlated, we would better understand the implementation of pitbull restrictions.

The Miami Dade County Ordinance states that pitbulls should be confined indoors or outdoors because pitbulls are naturally inclined to attack humans and other animals. In addition, the owner should have a “Dangerous Dog” sign posted. If owners fail to comply with these rules, the dogs will be muzzled to prevent bites and injuries to others. They are also to be kept on a leash. Exceptions to these rules are for dogs participating in dog shows, contests, or hunting.

LOADING AND EXAMINING THE DATA

To begin exploring the Miami 311 data we downloaded the csv file from the Miami Dade County website ( https://opendata.miamidade.gov/311/311-Service-Requests-Miami-Dade-County/dj6j-qg5t ) and saved it to our working directories. Next, we imported the csv file into an R object called df. Downloading and importing may take some time because the data set contains over 641,000 rows. After filtering for our selected variables, there were over 13,000 observations.

#df <- read.csv("311_Service_Requests_-_Miami-Dade_County.csv")
#head(df) # much too long to display in document 
#str(df)  # much too long to display in document
#names(df)

Calling str(df) displays the variables, their types, and the first few entries of each column. This data frame contains a mixture of categorical and numerical variables. Categorical variables are usually indicated by Factor w/ levels. In addition, calling names(df) will display the column names. This data frame consists of 23 columns.

CLEANING THE DATA

For the purpose of exploring the data, we drew a random sample of size 50 from df and called it df2. We set replace = FALSE so no entries were repeated. Then we saved it to a csv file titled “df2.csv”.

library(dplyr)
#df2 <- sample_n(df, size = 50, replace = FALSE)
# save as csv
#write.csv(df2, "df2.csv")

Using the dplyr package, we selected the columns needed for analysis from df2. We had no use for the columns titled Ticket.Created.Date…Time, Ticket.Last.Updated.Date…Time, Ticket.Closed.Date…Time, Ticket.Status, X.Coordinate, and Y.Coordinate, among others; see the code below for which columns we kept. We saved the selected columns to a new R object called cdf, for clean data frame. The dplyr package is handy for cleaning because of functions like select() and filter(), and its pipe operator %>% makes it easy to chain several operations together. We shortened the original column names and set them all to lower case. Using the functions fix_year(), fix_month(), and convert_month() from the Miami311p package, we converted the character strings in the created column into two additional columns, year and month, and then removed the created column. Lastly, we decided it best to save the data at every stage in case we needed to retrace our steps, and therefore created a new csv file with the clean data.

df2 <- read.csv("df2.csv")
cdf<- df2 %>% select("Ticket.ID", "Issue.Type", "City", "Neighborhood...District...Ward...etc.", "Created.Year.Month", "Longitude", "Latitude", "Method.Received", "Goal.Days", "Actual.Completed.Days")
colnames(cdf) <- c("id", "issue.type", "city", "district", "created","longitude", "latitude", "method", "goal.days", "actual.days")

library(Miami311p)
#create vectors of months and years
year <- fix_year(cdf$created)
##  [1] "2013" "2013" "2015" "2017" "2014" "2015" "2015" "2018" "2016" "2014"
## [11] "2013" "2015" "2017" "2016" "2015" "2015" "2015" "2015" "2017" "2017"
## [21] "2014" "2016" "2015" "2017" "2017" "2016" "2014" "2014" "2015" "2013"
## [31] "2015" "2014" "2015" "2017" "2015" "2013" "2015" "2017" "2014" "2017"
## [41] "2017" "2014" "2017" "2016" "2017" "2016" "2014" "2015" "2015" "2017"
month <- fix_month(cdf$created)
##  [1] 6  7  6  1  4  1  12 1  6  2  11 8  5  10 11 1  3  8  6  1  4  4  10
## [24] 7  11 1  3  1  10 8  5  9  12 8  6  11 7  2  7  12 11 2  8  5  8  5 
## [47] 8  3  11 8
month <- convert_month(month) 
##  [1] June      July      June      January   April     January   December 
##  [8] January   June      February  November  August    May       October  
## [15] November  January   March     August    June      January   April    
## [22] April     October   July      November  January   March     January  
## [29] October   August    May       September December  August    June     
## [36] November  July      February  July      December  November  February 
## [43] August    May       August    May       August    March     November 
## [50] August
# bind month and years to clean data frame
cdf$year <- factor(year)
cdf$month <- factor(month)
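The Miami311p helpers above come from a custom package; for readers without it, hedged base-R stand-ins (assuming the created column stores the year and month concatenated, e.g. 20136 for June 2013) might look like:

```r
# Hypothetical equivalents of Miami311p's fix_year(), fix_month(), and
# convert_month(), assuming "created" values like 20136 ("2013" + "6").
fix_year <- function(x) substr(as.character(x), 1, 4)
fix_month <- function(x) as.integer(substring(as.character(x), 5))
convert_month <- function(m) month.name[m]

created <- c(20136, 201512, 20171)
fix_year(created)                   # "2013" "2015" "2017"
fix_month(created)                  # 6 12 1
convert_month(fix_month(created))   # "June" "December" "January"
```

These are sketches only; the real package functions may handle edge cases differently.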

#check structure
str(cdf)
## 'data.frame':    50 obs. of  12 variables:
##  $ id         : Factor w/ 50 levels "13-10037389",..: 1 2 20 37 10 16 30 50 35 7 ...
##  $ issue.type : Factor w/ 30 levels "ABANDONED PROPERTY / VEHICLE",..: 17 13 29 8 27 30 20 10 19 6 ...
##  $ city       : Factor w/ 9 levels "City_of_Hialeah",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ district   : Factor w/ 13 levels "District 1","District 10",..: 2 10 2 13 6 3 3 13 2 13 ...
##  $ created    : int  20136 20137 20156 20171 20144 20151 201512 20181 20166 20142 ...
##  $ longitude  : num  -80.4 -80.3 -80.3 -80.4 -80.2 ...
##  $ latitude   : num  25.7 25.7 25.7 25.6 25.9 ...
##  $ method     : Factor w/ 9 levels "EMAIL","INHOUSE",..: 9 9 5 5 5 5 9 9 5 9 ...
##  $ goal.days  : int  90 90 10 4 30 30 30 180 30 120 ...
##  $ actual.days: int  155 16 1 3 3 32 0 NA 0 1 ...
##  $ year       : Factor w/ 6 levels "2013","2014",..: 1 1 3 5 2 3 3 6 4 2 ...
##  $ month      : Factor w/ 12 levels "April","August",..: 7 6 7 5 1 5 3 5 7 4 ...
# check the column position of "created", remove it, and reassign cdf
names(cdf)
##  [1] "id"          "issue.type"  "city"        "district"    "created"    
##  [6] "longitude"   "latitude"    "method"      "goal.days"   "actual.days"
## [11] "year"        "month"
cdf<- cdf[ , -5]
names(cdf)
##  [1] "id"          "issue.type"  "city"        "district"    "longitude"  
##  [6] "latitude"    "method"      "goal.days"   "actual.days" "year"       
## [11] "month"
# examine new product
head(cdf , 3)
##            id                             issue.type              city
## 1 13-10037389             RIGHT OF WAY - MAINTENANCE Miami_Dade_County
## 2 13-10056138            JUNK AND TRASH / OVERGROWTH Miami_Dade_County
## 3 15-10188131 VISUAL OBSTRUCTION SAFETY ISSUE (RAAM) Miami_Dade_County
##      district longitude latitude   method goal.days actual.days year month
## 1 District 10 -80.40758 25.70405 XTERFACE        90         155 2013  June
## 2  District 6 -80.29507 25.74856 XTERFACE        90          16 2013  July
## 3 District 10 -80.34385 25.69773    PHONE        10           1 2015  June
# save to csv
# write.csv(cdf, "Clean df2.csv")

EXPLORING IN EXCEL

Once the data was cleaned we needed to narrow our scope of analysis. After calling str(cdf) we noticed that the city column had 37 levels, district had 14 levels, and issue.type, our main variable for analysis, had 205 levels. We imported the cleaned data set into Excel to take a closer look. Excel allowed us to view the data set all at once and to use its filter function to examine the levels of each categorical variable. The picture below demonstrates the complexity of the data: there are sublevels within levels in the variable issue.type. For example, the top level “Traffic” has multiple sublevels such as “Signal Ped Crossing Time Too Short” and “Sign Down Damaged Faded Missing (Other Than Control Sign)”.

Example of Excel Filter Function

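The same inspection can also be done without leaving R; a minimal sketch on a toy frame (the real data has far more levels):

```r
# Count the distinct levels of a categorical column and tabulate the most
# common values, as the Excel filter does (toy data for illustration only).
toy <- data.frame(
  issue.type = factor(c("ANIMAL BITE TO A PERSON", "PIT BULL INVESTIGATION",
                        "TRAFFIC", "TRAFFIC")))
nlevels(toy$issue.type)                          # number of distinct levels
sort(table(toy$issue.type), decreasing = TRUE)   # frequency of each level
```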

VISUALIZATIONS

  1. Barchart: “Pit Bull Investigation” & “Animal Bite To A Person” by year
  2. Mapping in R
  3. GIS: Images of “Pit Bull Investigation” and “Animal Bite To A Person”
  4. Barchart: Frequency of cities; category “Animal Bite To A Person”
  5. Barchart: Frequency of cities; category “Pit Bull Investigation”

The following sections use a data set subsetted from df using Excel. The column names have not been changed as in the initial cleaning presented above.
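The Excel subset could equivalently have been produced in R with dplyr; a sketch, using a toy stand-in for the full df (column names assumed from the original data):

```r
library(dplyr)

# Toy stand-in for the full df (illustration only).
df <- data.frame(
  Issue.Type = c("ANIMAL BITE TO A PERSON", "TRAFFIC",
                 "PIT BULL INVESTIGATION", "TRAFFIC"),
  Created.Year.Month = c(2017, 2016, 2017, 2015))

# Keep only the two issue types of interest (the Excel filter's equivalent).
pb <- df %>%
  filter(Issue.Type %in% c("ANIMAL BITE TO A PERSON",
                           "PIT BULL INVESTIGATION"))
nrow(pb)   # 2
```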

CREATING A BARCHART: CHART 1

“Pit Bull Investigation” and “Animal Bite to A Person” (2013 - 2017)

library(ggplot2)
## Need help? Try the ggplot2 mailing list:
## http://groups.google.com/group/ggplot2.

Read in the data into an R object.

pb<- read.csv("C:/Users/pietr/Desktop/Data/311 Bites and Pits.csv")

Create a plot using the ggplot2 package. The base of plotting in ggplot2 is always ggplot(). Within the aes() argument, include the x axis; the y axis will be the count. In this case, x = Created.Year.Month. We want to color the bars by Issue Type, so we pass that as our fill argument.

ggplot(pb, aes(`Created.Year.Month`, fill = `Issue.Type`)) + 
  # dodge will unstack the bars and put them side by side
  geom_bar(position = "dodge")+ 
  # x axis title 
  xlab('Year')+
  # y axis title
  ylab("Count")+
  # title of graph
  ggtitle("Animal Bite to a Person and Pit Bull Investigations 2013 - 2017")+
  # add a legend: name = "title of legend", values = c("colors", "of", "legend"))
  scale_fill_manual(name = "Issue Type", values = c("rosybrown3", "cornflowerblue"))+
  # remove the default grey background
  theme_minimal()+
  #change legend position on graph 
  theme(legend.position = "top")+
  #selects title of the plot, selects text of title, hjust = (side of graph 0 - 1)
  # hjust = 0.5, will center the title
  theme(plot.title = element_text(hjust = 0.5, size =15))

MAPPING IN R

setwd("C:/Users/pietr/Desktop/Data")
library(dplyr)
library(ggplot2)
library(devtools)
## Warning: package 'usethis' was built under R version 3.4.4
library(stringr)
library(maps)
#install.packages("mapdata")
library("mapdata")
## Warning: package 'mapdata' was built under R version 3.4.4
#install.packages("ggmap")
#library(ggmap)

# Create your data object 
pb<- read.csv("C:/Users/pietr/Desktop/Data/311 Bites and Pits.csv")

# Source: http://eriqande.github.io/rep-res-web/lectures/making-maps-with-R.html

# Create the Florida map 
states <- map_data("state")
fl_map <- subset(states, region=="florida")
head(fl_map)
##           long      lat group order  region subregion
## 1462 -85.01548 30.99702     9  1462 florida      <NA>
## 1463 -84.99829 30.96264     9  1463 florida      <NA>
## 1464 -84.97537 30.92253     9  1464 florida      <NA>
## 1465 -84.94672 30.89962     9  1465 florida      <NA>
## 1466 -84.94099 30.88815     9  1466 florida      <NA>
## 1467 -84.94672 30.85951     9  1467 florida      <NA>
# Get the counties 
counties <- map_data("county")
# Filter out the county of Miami-Dade
md <- counties %>% filter(subregion == "miami-dade")
head(md)  # preview; printing all of md is too long for the document
# Plot Miami-Dade
md_map <-ggplot(md, mapping = aes(x= long, y= lat))+
  coord_fixed(1.3) +
  geom_polygon(color= "black", fill="cornsilk2")+
  xlab("Longitude")+
  ylab("Latitude")+
  ggtitle("Miami Dade County\n2013-2017")+
  theme_dark()+
  theme(text = element_text(size = 15))
md_map

# Split the data to be plotted
set.seed(101)
pb2 <- sample_n(pb, size= 1000)
pb2$Issue.Type<- factor(pb2$Issue.Type)
pb2.split<- split(pb2, pb2$Issue.Type)
pb2b<- pb2.split$`ANIMAL BITE TO A PERSON`
pb2p<-pb2.split$`PIT BULL INVESTIGATION`

# Plot the data points by latitude and longitude.
# Panel by year, color by issue type
md_map2 <- md_map+ 
  geom_point(data = pb2b, aes(Longitude, Latitude, color= "rosybrown3"),alpha = 0.5)+
  geom_point(data = pb2p, aes(Longitude, Latitude, color= "cornflowerblue"), alpha = 0.5)+
  coord_fixed(1.3)+
  facet_grid(.~Created.Year.Month)+
  theme(plot.title = element_text(hjust = 0.5, size =30))+
  theme(axis.text.x = element_text(angle = 90, hjust = 1))+
  scale_colour_manual(name = 'Issue Type', guide = "legend",
                      values =c('rosybrown3'='rosybrown3','cornflowerblue'='cornflowerblue'), 
                      labels = c('ANIMAL BITE TO A PERSON','PIT BULL INVESTIGATION'))+
  theme(legend.position = "top", 
        legend.text.align = 0.5, 
        legend.text = element_text(size = 8), 
        legend.title = element_text(size= 12),
        legend.key=element_rect(fill = "white"))+
  guides(colour = guide_legend(title.position = "top", title.hjust = 0.5))

md_map2
## Warning: Removed 5 rows containing missing values (geom_point).

CREATING GIS IMAGES

“Animal Bite To A Person” and “Pit Bull Investigation”

The map created in R lacked readability. GIS provided a better way to display the spatial data. Below are some of the resulting graphics.

GIS Image of Animal Bites (white) and Pit Bull Investigations (black)


GIS Image of all Instances of Animal Bites and Pit Bull Investigations color coded by District


GIS Image of all Instances of Animal Bites and Pit Bull Investigations color coded by District


For this map, the plotted symbols are Animal Bites to a Person and Pitbull Investigations in 2017. The yellow symbols represent Animal Bites to a Person; the grey symbols are Pitbull Investigations. The heatmap also shows the concentration of incidents in various areas. The GIS image by incident type was used to show which areas had a high concentration of pitbull investigations and animal bites.

This map shows the concentration by district. The heatmap shows the concentration per area. The GIS image by district was used to show which areas had a high concentration of pitbull investigations and animal bites.

The next two images below are a side by side comparison showing which areas had a high, medium, or low concentration of incidents.
GIS Image Animal Bites to a person (2013 - 2017).

GIS Image Pitbull Investigations (2013 - 2017).


CREATING A BARCHART: CHART 2

Frequency Barchart of “Animal Bite To A Person” (2017)

Our team wanted to find the cities in Miami Dade County appearing most frequently among the “Animal Bite To A Person” and “Pit Bull Investigation” calls for the year 2017. The steps below prepare both categories; this chart plots the animal bite frequencies, and Chart 3 plots the pit bull investigations. [2]

Using the data set pb, filter for the year 2017. We saved the result to the object pb2017.

pb2017<- pb %>% filter(`Created.Year.Month` == 2017) 

Since we wanted to plot the most frequent cities, we first plotted all the cities using the plotly package. Plotly has interactive graphs, so we could directly point and click to find values. This graph showed that an overwhelming number of occurrences came from “Miami_Dade_County”. Due to this, we decided to remove that value from our data set. In addition, “Miami_Dade_County” is not a city but the county itself. We believe it is assigned to cases that have undocumented cities or is a default value.

g2 <- ggplot(pb2017, aes(City, fill = `Issue.Type`)) + 
  geom_bar(position = "dodge") 
library(plotly)
ggplotly(g2)

Convert City to a factor, then split pb2017 by city. This creates a list with one element per city containing that city's data, which allowed us to remove “Miami_Dade_County” as an element of the list.

pb2017$City<- factor(pb2017$City)
pbsplit <- split(pb2017, pb2017$City)
tail(names(pbsplit),10)
##  [1] "Miami_Dade_County"           "Miami_Shores_Village"       
##  [3] "North_Bay_Village"           "Town_of_Bay_Harbor_Islands" 
##  [5] "Town_of_Cutler_Bay"          "Town_of_Medley"             
##  [7] "Village_of_Biscayne_Park"    "Village_of_El_Portal"       
##  [9] "Village_of_Key_Biscayne"     "Village_of_Virginia_Gardens"

Calling names(pbsplit) showed that “Miami_Dade_County” was an element of the list. The code below removes it by name, which is safer than removing by numeric position.

pbsplit[["Miami_Dade_County"]] <- NULL
tail(names(pbsplit), 10)
##  [1] "City_of_West_Miami"          "Miami_Shores_Village"       
##  [3] "North_Bay_Village"           "Town_of_Bay_Harbor_Islands" 
##  [5] "Town_of_Cutler_Bay"          "Town_of_Medley"             
##  [7] "Village_of_Biscayne_Park"    "Village_of_El_Portal"       
##  [9] "Village_of_Key_Biscayne"     "Village_of_Virginia_Gardens"

Combine the list back into a data frame. Character strings will be converted to factors.

pbmerge <- do.call(rbind.data.frame, pbsplit)
# Just because we thought it important to know how much data we were missing
sum(is.na(pbmerge))
## [1] 6

Split the data frame by Issue Type. We called this object pbtype.split. Then assign the list elements by Issue Type to bite.split for animal bites and pit.split for pit bull investigations.

pbtype.split<- split(pbmerge, pbmerge$Issue.Type)
bite.split<- pbtype.split$`ANIMAL BITE TO A PERSON`
pit.split<- pbtype.split$`PIT BULL INVESTIGATION`

Extract the city column from bite.split and pit.split. This is the text information we used to create the word clouds. Save them as tab-delimited files in two separate empty folders in the working directory.

bite.city <- as.character(bite.split$City)
pit.city <- as.character(pit.split$City)
# save to txt file
#write.table(bite.city, "bite.txt", sep="\t")
#write.table(pit.city, "pit.txt", sep="\t")

Load the required packages.

library(tm)
library(wordcloud)

Create the corpus.

bite.text <- readLines("C:/Users/pietr/Desktop/Data/LIS 5802 Fianl Project bite word cloud/bite.txt")
bite.corpus <- Corpus(VectorSource(bite.text))

Clean the text. From inspecting the head of the bite corpus, there were many extra characters that we needed to remove, such as tab characters, numbers, and punctuation. The toSpace function was taken from [2]. There is also other text-cleaning code for patterns we thought we would need to remove, but we chose not to use it because it separated the phrases in the word cloud.

toSpace <- content_transformer(function (x , pattern ) gsub(pattern, " ", x))
bite.corpus <- tm_map(bite.corpus, toSpace, "\t")
# bite.corpus <- tm_map(bite.corpus, toSpace, "_")
bite.corpus <- tm_map(bite.corpus, removeNumbers)
bite.corpus <- tm_map(bite.corpus, removePunctuation)
# bite.corpus <- tm_map(bite.corpus, content_transformer(tolower))

Create the term document matrix. The tm package does all the work here. We saved the matrix as bite.tdm.

bite.tdm <- TermDocumentMatrix(bite.corpus)
bite.matrix <- as.matrix(bite.tdm)
bite.sort <- sort(rowSums(bite.matrix), decreasing = TRUE)
as.vector(bite.sort)
##  [1] 296 112  64  63  53  52  41  39  35  30  29  18  18  17  13  13  12
## [18]  12  12   9   8   5   5   5   4   3   3   3   1   1   1

Create a vector with the populations of each city represented in the term document matrix.

# Divide by population 
Frequency<- bite.sort
pop <- as.vector(c(453579, 236387, 44707, 45704, 60512, 107167, 46780, 
                 58786, 87779, 41523, 23410, 35762, 18223, 11245, 11657,
                 13499, 21744, 29361, 15219, 12344, 13809, 5965, 10493,
                 838, 20832, 5744, 7137, 5628, 3055, 2325, 2375))

Combine the city names, frequencies, and populations into a data frame.

City <- names(bite.sort)
bite.chart <- print(cbind(City, Frequency, pop), quote = FALSE)
bite.chart <- as.data.frame(bite.chart)

Convert the frequency and population columns back to numeric and compute the per-population frequency.

str(bite.chart)
## 'data.frame':    31 obs. of  3 variables:
##  $ City     : Factor w/ 31 levels "cityofaventura",..: 8 5 26 3 7 10 2 13 9 14 ...
##   ..- attr(*, "names")= chr  "cityofmiami" "cityofhialeah" "townofcutlerbay" "cityofdoral" ...
##  $ Frequency: Factor w/ 21 levels "1","112","12",..: 8 2 19 18 17 16 14 12 11 10 ...
##   ..- attr(*, "names")= chr  "cityofmiami" "cityofhialeah" "townofcutlerbay" "cityofdoral" ...
##  $ pop      : Factor w/ 31 levels "10493","107167",..: 21 14 20 22 28 2 23 26 31 19 ...
##   ..- attr(*, "names")= chr  "cityofmiami" "cityofhialeah" "townofcutlerbay" "cityofdoral" ...
# convert via as.character to recover the values (as.numeric alone would
# return the internal factor level codes)
bite.chart$Frequency <- as.numeric(as.character(bite.chart$Frequency))
bite.chart$pop <- as.numeric(as.character(bite.chart$pop))
freq<- bite.chart$Frequency
n <- (freq/(pop))
bite.chart$n <- n 
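One pitfall worth noting here: calling as.numeric() directly on a factor returns the internal level codes rather than the original values, so the conversion should go through as.character() first. A small demonstration:

```r
# as.numeric() on a factor yields level codes (levels sort alphabetically),
# not the original numbers; as.character() first recovers the true values.
f <- factor(c("296", "112", "64"))
as.numeric(f)                  # 2 1 3  (level codes, wrong)
as.numeric(as.character(f))    # 296 112 64  (correct)
```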

attach(bite.chart)
## The following objects are masked _by_ .GlobalEnv:
## 
##     City, Frequency, n, pop
bite.chart2 <- bite.chart[order(-n),]

Fix the frequency data for plotting. This pairs the names of the cities with their frequencies. We subsetted the 5 highest cities, denoted fc, from the data.

f <- as.numeric(as.vector(bite.chart2[1:5,4]))
c <- as.vector(bite.chart2[1:5,1])
fc<- as.data.frame(cbind(f,c))
fc$f<-as.numeric(f)
fc$c <- c("Town of Medley", "City of West Miami", "Village of Key Biscayne", "Town of Bay Harbor Islands", "City of Surfside")

Plot the frequencies in a bar chart.

ggplot(fc, aes(c, f))+
  geom_bar(stat = "identity", fill="rosybrown3")+
  xlab("City")+
  ylab("Frequency per Population")+ 
  ggtitle("Cities with Highest Frequency of 311 Calls for Animal Bite to a Person for 2017")+
  theme_minimal()

CREATING A BARCHART: CHART 3

Frequency Barchart of “Pit Bull Investigation” (2017)

# Create the corpus 
pit.text <- readLines("C:/Users/pietr/Desktop/Data/LIS 5802 Final Project pit word cloud/pit.txt")
pit.corpus <- Corpus(VectorSource(pit.text))
inspect(pit.corpus)

# Clean the text. Code adapted from [2], as above.
pit.corpus <- tm_map(pit.corpus, toSpace, "\t")
pit.corpus <- tm_map(pit.corpus, removeNumbers)
pit.corpus <- tm_map(pit.corpus, removePunctuation)
#pit.corpus <- tm_map(pit.corpus, removeWords, c("CityofMiami")) # Remove after because of high population
inspect(pit.corpus)

# Term Document Matrix 
pit.tdm <- TermDocumentMatrix(pit.corpus)
pit.matrix <- as.matrix(pit.tdm)
pit.sort <- sort(rowSums(pit.matrix), decreasing = TRUE)
as.vector(pit.sort)

# Combine Term Document Matrix into a table
City <- names(pit.sort)
Frequency <- as.numeric(pit.sort)
pit.chart <- print(cbind(City, Frequency), quote = FALSE)
pit.chart <- as.data.frame(pit.chart)

# Account for population
Frequency<- pit.sort
pop <- as.vector(c(453579, 107167, 60512, 40286, 87779, 58786, 
                   58786, 41523, 29361, 11245, 15219, 46780, 5744, 
                   45704, 21744, 13809, 23410, 11657, 10493, 35762,
                   18223, 5965, 2325))

str(pit.chart)
# via as.character to recover the values, not the factor level codes
pit.chart$Frequency <- as.numeric(as.character(pit.chart$Frequency))
pit.chart$pop<- as.numeric(pop)
freq<- pit.chart$Frequency
n <- (freq/pop)  # per-population frequency, matching Chart 2
pit.chart$n <- n 

attach(pit.chart)
## The following objects are masked _by_ .GlobalEnv:
## 
##     City, Frequency, n, pop
## The following objects are masked from bite.chart:
## 
##     City, Frequency, n, pop
pit.chart2 <- pit.chart[order(-n),]

# Fix frequency data for plotting (subset from the sorted pit.chart2,
# as with bite.chart2 in Chart 2)
f <- as.numeric(as.vector(pit.chart2[1:5,4]))
c <- as.vector(pit.chart2[1:5,1])
fc<- as.data.frame(cbind(f,c))
fc$f<-as.numeric(f)
fc
fc$c <- c("City of Miami", "City of Miami Gardens", "City of Homestead", "City of Hialeah", "Town of Cutler Bay")
# Plot the frequencies 
ggplot(fc, aes(c, f))+
  geom_bar(stat = "identity", fill="cornflowerblue")+
  xlab("City")+
  ylab("Frequency per Population")+ 
  ggtitle("Cities with Highest Frequency of 311 Calls for Pit Bull Investigations for 2017")+
  theme_minimal()

LIMITATIONS

While completing the project we came face to face with many issues. Our main limitation was overcomplicating the data analysis and getting lost in it. For example, the word clouds below were made without accounting for population; although they provided another way of displaying the frequency data, they were not needed (see images below). Also, the description of each Animal Bite 311 call is missing from the data set, so there is no way to tell what animals were involved in the actual biting incidents. If the description of the call were available, we could have used only the calls regarding pitbull bites and compared their locations to those of the Pitbull Investigation variable.

Word Cloud of Animal Bite frequency before the population was accounted for (2017)


Word Cloud for Pitbull Investigation frequency before population was accounted for (2017)

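The word cloud code itself is not shown above; a hypothetical reconstruction from the sorted term frequencies (bite.sort in the tm steps) might look like the following, with toy frequencies standing in for the real ones:

```r
library(wordcloud)  # also loads RColorBrewer for brewer.pal()

# Toy frequencies standing in for bite.sort (the real values come from the
# term document matrix built with tm).
bite.sort <- c(cityofmiami = 296, cityofhialeah = 112, townofcutlerbay = 64)

set.seed(123)  # word placement is random
wordcloud(words = names(bite.sort), freq = bite.sort,
          min.freq = 1, random.order = FALSE,
          colors = brewer.pal(8, "Dark2"))
```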

FINAL RESULTS

Certain areas of Miami Dade County had a high concentration of Animal Bites to a Person calls but not Pitbull Investigation calls. As mentioned before, having the descriptions of the calls could have given more insight into the data. The pit bull ordinance could also be an underlying factor in understanding why there were lower concentrations of pitbull investigations in certain areas. In conclusion, while we were able to see which areas had a higher concentration of both incidents, we could not further analyze the data or gain a better understanding of why there were higher concentrations in some areas and not in others.

SOURCES

[1] http://eriqande.github.io/rep-res-web/lectures/making-maps-with-R.html
[2] http://www.sthda.com/english/wiki/text-mining-and-word-cloud-fundamentals-in-r-5-simple-steps-you-should-know